Enabling Scientists to Understand their Data using Web-Based Statistical Tools

Eric Hare

Iowa State University
September 23rd, 2016

Background

About Me

Publications

Awards

Dissertation Outline

Working Title: Enabling Scientists to Understand their Data using Web-Based Statistical Tools

Papers:

  1. Automatic Matching of Bullet Lands (Accepted with Revisions in AoAS)
  2. Designing Modular Software: A Case Study in Introductory Statistics (Accepted with Revisions in JCGS)
  3. Visual Inference (In Progress)

Common Themes

  1. Reproducible Research
  2. Interactive Graphics
  3. Exploratory Data Analysis
  4. Three-pronged approach:
    1. Statistical Component (Algorithm, optimization, lineup protocol)
    2. Data Science Component (Open-source R packages)
    3. Web-Based Component (Interface for non-developers)

Dissertation Chapter One

Automatic Matching of Bullet Lands

Eric Hare, Heike Hofmann, Alicia Carriquiry
Center for Statistics and Applications in Forensic Evidence (CSAFE)

Goal

Current Practice

These problems culminated in a 2009 NAS report, which found, among other things, that “much forensic evidence – including, for example, bite marks and firearm and toolmark identification – is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing to explain the limits of the discipline.”

James Hamby Study

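# read.x3p() and plot3D.x3p.file() are assumed to come from the x3pr package
library(x3pr)

# Read one land scan and render it as a 3D surface (path kept on one line so the string is valid)
plot3D.x3p.file(read.x3p("~/GitHub/imaging-paper/app/images/Hamby252_3DX3P1of2/Br1 Bullet 1-5.x3p"),
                plot.type = "surface")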

Data Format

Step One: Extract a Profile

We need to choose a height on the bullet at which to extract a profile. To do so, we optimize the cross-correlation function (CCF) (Vorburger, 2011):

  1. Extract a profile near the base of the bullet; call its height d0.
  2. Take a fixed step d and extract a second profile at height d0 + d.
  3. Compute the maximum cross-correlation (CCF) between the profiles at d0 and at d0 + d.
  4. If this CCF exceeds a threshold c, choose the profile at d0 as the signature.
  5. Otherwise, repeat steps 2 to 4 for d, 2d, 3d, … until the threshold is achieved.
  6. If the threshold is never achieved, flag the land for further investigation.

Parameters: d = 25μm, d0 = 25μm, c = 0.9
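
The search above can be sketched in base R. This is only an illustration under stated assumptions: `get_profile` is a hypothetical function returning the profile at a given height (simulated here), `max_ccf` and `find_stable_height` are names made up for this sketch, and step 5 is read as advancing the base height by d each round.

```r
# Maximum cross-correlation between two equal-length profiles over a range of lags.
max_ccf <- function(a, b, lag.max = 20) {
  max(ccf(a, b, lag.max = lag.max, plot = FALSE)$acf)
}

# Walk up the bullet in steps of d until two neighbouring profiles agree.
# `get_profile(x)` is a hypothetical helper returning a numeric profile at height x.
find_stable_height <- function(get_profile, d0 = 25, d = 25,
                               threshold = 0.9, x_max = 500) {
  x <- d0
  while (x + d <= x_max) {
    if (max_ccf(get_profile(x), get_profile(x + d)) > threshold) return(x)
    x <- x + d
  }
  NA_real_  # threshold never reached: flag the land for further investigation
}

# Simulated land: very noisy below height 100, stable from 100 upward.
set.seed(1)
base_shape <- sin(seq(0, 6 * pi, length.out = 200))
sim_profile <- function(x) {
  base_shape + rnorm(200, sd = if (x < 100) 2 else 0.05)
}
stable_height <- find_stable_height(sim_profile)
```

Returning `NA` rather than stopping with an error keeps a batch run over many lands going while still marking the lands that need a manual look.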

Step One (Continued)

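library(x3prplus)  # assumed source of get_crosscut() and the helpers used in later steps
library(ggplot2)   # qplot(), theme_bw()

# Extract the profile (crosscut) at height x = 243.75 and plot it
br111 <- get_crosscut("images/Br1 Bullet 1-5.x3p", x = 243.75)
qplot(y, value, data = br111) + theme_bw()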

Step Two: Remove Shoulders

The striations that identify a bullet to a gun barrel are located in the land impression areas (Xie, 2009), so the shoulders on either side must be removed.

  1. At a fixed height x, extract a bullet’s profile (previous figure, with x = 243.75μm).
  2. For each y value, smooth out any deviations occurring near the minima by applying a rolling average with a pre-set smoothing factor s.
  3. For each smoothed y value, compute another rolling average using the same smoothing factor s as above.
  4. Determine the location of the peak of each shoulder by finding the first and last doubly-smoothed value yi that is the maximum within its smoothing window.

Parameters: s = 35μm
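
A minimal base R sketch of the double smoothing and peak search follows. It is not the `get_grooves` implementation: `roll_mean` and `find_shoulders` are hypothetical names, the window `s` is counted in observations rather than micrometres, and the profile is simulated.

```r
# Centered rolling mean; the first and last s %/% 2 values come back as NA.
roll_mean <- function(v, s) {
  as.numeric(stats::filter(v, rep(1 / s, s), sides = 2))
}

# Locate shoulder peaks as the first and last doubly-smoothed value that is
# the maximum within its own smoothing window.
find_shoulders <- function(value, s = 35) {
  smooth2 <- roll_mean(roll_mean(value, s), s)
  half <- s %/% 2
  idx <- which(!is.na(smooth2))
  is_peak <- vapply(idx, function(i) {
    window <- smooth2[max(1, i - half):min(length(smooth2), i + half)]
    smooth2[i] == max(window, na.rm = TRUE)
  }, logical(1))
  peaks <- idx[is_peak]
  c(left = peaks[1], right = peaks[length(peaks)])
}

# Simulated profile: shoulders near positions 60 and 340, fine striae in between.
set.seed(7)
y <- 1:400
value <- 10 * exp(-((y - 60) / 15)^2 / 2) +
  10 * exp(-((y - 340) / 15)^2 / 2) +
  0.2 * sin(y / 3) + rnorm(400, sd = 0.05)
shoulders <- find_shoulders(value)
```

Smoothing twice suppresses the fine striae so that only the broad shoulder peaks dominate at the two ends of the profile; the land impression is then the region between `shoulders["left"]` and `shoulders["right"]`.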

Identifying Shoulders (Easy)

br111.groove <- get_grooves(br111)
br111.groove$plot

Identifying Shoulders (Challenging)

result2 <- get_grooves(get_crosscut("~/GitHub/imaging-paper/app/images/Hamby252_3DX3P1of2/Br1 Bullet 1-6.x3p"))
result2$plot

Step Three: Fit Loess Regression

Locally weighted scatterplot smoothing (loess; Cleveland, 1979) fits a low-degree polynomial to a small subset of the data around each point to be estimated, weighting values near that point more strongly.

br111.loess <- fit_loess(br111, br111.groove)
br111.loess$fitted

Step Four: Get the Residuals

Deviations from the loess fit should represent the imperfections (striations) on the bullet. Hence, we extract the residuals from the model.

br111.loess$resid
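
To make steps three and four concrete, here is a sketch using base R's `loess()` on a simulated land. The data, the curvature, and the `span` value are assumptions for illustration; the settings used inside `fit_loess` are not stated here.

```r
# Simulated land: broad curvature plus fine striation marks plus measurement noise.
set.seed(42)
y <- seq(0, 2000, by = 5)                # position along the land
curvature <- -1e-5 * (y - 1000)^2        # overall shape of the bullet surface
striae <- 0.5 * sin(y / 15)              # the signal we want to recover
land <- data.frame(y = y,
                   value = curvature + striae + rnorm(length(y), sd = 0.05))

# Step three: the loess fit captures the broad curvature only.
fit <- loess(value ~ y, data = land, span = 0.75)
land$fitted <- as.numeric(fitted(fit))

# Step four: the residuals are what remains, i.e. the striation signature.
land$resid <- as.numeric(residuals(fit))
```

With a wide span the smoother cannot follow the fine marks, so they end up in the residuals, which is the behaviour the two steps rely on.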

Step Five: Peaks and Valleys

As with detecting the shoulders, we can smooth the deviations and compute derivatives to identify peaks and valleys in the signature.

br111.peaks <- get_peaks(br111.loess$data)
br111.peaks$plot
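
One simple way to express this idea in base R is to smooth the signature and look for sign changes in its first difference. The function name `find_extrema` and the window size are hypothetical; this is a sketch of the technique, not the `get_peaks` implementation.

```r
# Smooth a signature, then mark where the discrete derivative changes sign.
find_extrema <- function(sig, s = 5) {
  sm <- as.numeric(stats::filter(sig, rep(1 / s, s), sides = 2))
  slope <- sign(diff(sm))   # +1 where the smoothed signature rises, -1 where it falls
  turn <- diff(slope)       # -2 at a peak, +2 at a valley
  list(peaks = which(turn == -2) + 1,
       valleys = which(turn == 2) + 1)
}

# Two full sine cycles: peaks at indices 26 and 126, valleys at 76 and 176.
sig <- sin(seq(0, 4 * pi, length.out = 201))
extrema <- find_extrema(sig)
```

The NA values that the rolling mean leaves at both ends are ignored by `which()`, so no extrema are reported inside the unsmoothed margins.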

Step Six: Bullet Alignment

The previous five steps are performed for each bullet land individually. We now wish to extract features for cross-comparisons between bullet lands.

Step Six (Continued)

Step Six: Extract Features

Features are extracted from each land-to-land comparison:

Distribution of Features

Step Seven: Random Forest

Feature Importance

Web Application

https://erichare.shinyapps.io/x3prplus

The web-deployed version of the bullets application

Future Work

Dissertation Chapter Two

Designing Modular Software: A Case Study in Introductory Statistics

Dissertation Chapter Three

Visual Inference

User-facing Side

A modernized framework for running lineup experiments, in which participants identify the plot that differs most from a set of null plots, is in development.
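
As a rough illustration of the data behind a lineup, the base R sketch below hides the observed data at a random position among null panels generated by permutation. `make_lineup` and the `.sample` column are hypothetical names, and permuting y is just one common choice of null-generating mechanism; it is not the framework described in this chapter.

```r
# Build the data for an m-panel lineup: one observed panel, m - 1 null panels.
make_lineup <- function(data, m = 20) {
  pos <- sample(m, 1)                          # secret position of the observed data
  panels <- lapply(seq_len(m), function(i) {
    panel <- data
    if (i != pos) panel$y <- sample(panel$y)   # null panel: permuting y breaks the x-y link
    panel$.sample <- i
    panel
  })
  list(data = do.call(rbind, panels), pos = pos)
}

# Observed data with a strong linear trend.
set.seed(2016)
obs <- data.frame(x = 1:30, y = 1:30 + rnorm(30))
lu <- make_lineup(obs)

# Correlation within each panel; the observed panel should stand out.
cors <- sapply(split(lu$data, lu$data$.sample), function(p) cor(p$x, p$y))
```

Plotting `lu$data` faceted by `.sample` gives the lineup shown to participants, while `lu$pos` stays with the experimenter.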

An example lineup

Experimenter-facing Side

A new service, still in its early stages, will enable researchers to set up and run lineup experiments automatically, while specifying the randomization scheme, stratification factors, and parameters of the study.

A prototype of the admin user interface for the lineups app

Timeline

Deliverables Timeline

My timeline for graduation

Thank You

Special thanks to Alan Zheng at the National Institute of Standards and Technology for maintaining the NIST Ballistics Toolmark Research Database and providing many useful suggestions for our algorithm.

Any Questions?